Major Project


Introduction to the project

Since the Industrial revolution, humanity has been using fossil fuels to generate electricity. Unfortunately we cannot continue this trend anymore because of Climate Change. We have already emitted so much C02 that we have raised Earth's average temperature by 1.1 degree Celsius.

Since we all know that electricity is the basis of modern life and without it we cannot survive, one of the most important question which we are looking to answer is whether Non Fossils fuels and renewable sources of electricity can meet our electricity power demands 100% by 2050.

We would like to know what goes behind the scenes in order to generate electricity and observe the changes in method of electricity generation trends over a certain time period for almost all of the regions on our planet.

For our project, we have classified Coal, Natural Gas and Oil as dirty fuel sources (because the emit a lot of CO2). Nuclear, Hydro, Wind and Solar have been deemed as Eco-Friendly source because it does not emit any CO2 during its operation.

The main domain of interest for our group is “Sources of Electricity generation”. The reason why we are only focusing only on Electricity aspect is because we utilise more than 1/3rd of our Energy consumption just for Electricity generation.

Reference: - https://www.eia.gov/totalenergy/data/monthly/pdf/flow/total-energy-spaghettichart-2021.pdf


Basic Info

Made By: - Swarnim Khosla

Topic: - Analysis of different methods of electricity generation on the basis of various countries

Dataset Responsibility: - Electricity produced through both Eco-Friendly and CO2 intensive sources of generation.

Note: - Visualisations at the end (Section 10).

Skills demonstarted in the project: - SQL, Python (SQLAlchemy, Pandas, Plotly)


Dataset Sources

Eco-Friendly Fuel Sources

Nuclear

Source: - https://ourworldindata.org/grapher/nuclear-energy-generation?tab=table&country=~OWID_WRL

Hydro

Source: - https://ourworldindata.org/grapher/hydropower-consumption?tab=table

Wind

Source: - https://ourworldindata.org/grapher/wind-generation?tab=table

Solar

Source: - https://ourworldindata.org/grapher/solar-energy-consumption?tab=table

Dirty Fuel Sources

Coal

Source: - https://ourworldindata.org/grapher/electricity-coal?tab=table

Natural Gas

Source: - https://ourworldindata.org/grapher/electricity-gas?tab=table

Oil

Source: - https://ourworldindata.org/grapher/electricity-oil?tab=table

Other Datasets relevant to the topic

Total Electricty Consumed

Source: - https://ourworldindata.org/explorers/energy?tab=table&facet=none&country=USA~GBR~CHN~OWID_WRL~IND~BRA~ZAF&hideControls=false&Total+or+Breakdown=Total&Energy+or+Electricity=Electricity+only&Metric=Annual+generation


Work on Nuclear Dataset

Section 1

What does each column mean?

1) Entity column is basically a "List of Countries" column. 2) Code column tells us the 3 Character code of each country. If a row does not have Code, it means that Entity is NOT a country but a specific region on Earth. 3) Year column tells us the specific year for which that paticular data was collected. 4) Electricity from nuclear (TWh) gives us the electricity generated from Nuclear power in Terra-Watt Hours.

As we can see above, we are given the list of all the Entities which are not countries.

For our project we are only concerned with Countries and not regions. Therefore we shall delete these items.

Discrepancy

Each country is supposed to have a 3 letter Code attached to it.

We can notice that there are some Entities which have more than 3 letter code.

Let us explore that further.

Here we can see that there are 3 peculiar records in our table. Let us see all of these records in its entirety.

Decision

As we can see above, the result set for Kosovo, USSR and the World is given to us.

Reference: - https://en.wikipedia.org/wiki/International_recognition_of_Kosovo

Now we shall look at the list of all of the unique countries who atleast generated some energy from nuclear.

There are only 36 countries! which has ever generated electricity from eco-friendly Nuclear Power.

Now I would like to know the duration and number of years that a specific country used Nuclear to generate electricity.

Caution!

The answer given above is a little bit inaccurate in the sense that the oldest year record available in the .csv file is only from the year 1965.

This may give a false sense that for example United States started producing electricity from Nuclear from year 1965 even though in reality it started doing so from the year 1957.

Reference: - https://en.wikipedia.org/wiki/Nuclear_power_in_the_United_States#Start_of_commercial_nuclear_power

DDL Command

We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.

We can see that our column name has been indeed changed from Entity to Country.

Work on Hydro Dataset

Section 2

Note to the reader

Dear reader, I shall be executing the same 11 queries executed above for the Hydro Dataset and the rest of the energy-datasets as well. New queries are expected to come from Section 8 where I would be taking out Clean energy % by including the Energy.csv.

Skip to section 8, but do look at Inference in order to see different results specific to Hydro Dataset.

The columns of Hydro.csv are similar to that of nuclear.csv. For their interpretation please look before Query 1.

For our project we are only concerned with Countries and not regions. Therefore we shall delete these items.

From the list above, we can confirm that no NULL value is left in the table.

Inference and Comparison

There are 154 distinct countries! which has generated electricity from eco-friendly Hydro Power, which is way higher than nuclear countries (in total of 36 countries).

Work on Wind Dataset

Section 3

Note to the reader

Skip to section 8, but do look at Inference in order to see different results specific to Wind Dataset.

Inference and Comparison

There are 114 distinct countries! which has generated electricity from eco-friendly wind Power, which is way higher than nuclear countries (in total of 36 countries) and lower than Hydro (in total of 154 countries).

Work on Solar Dataset

Section 4

Note to the reader

Skip to section 8, but do look at Inference in order to see different results specific to Solar Dataset.

Inference and Comparison

There are 153 distinct countries! which has generated electricity from eco-friendly Solar Power, which is way higher than nuclear countries (in total of 36 countries) and wind countries (in total of 114) and lower than Hydro (in total of 154 countries).

Work on Coal Dataset

Section 5

Note to the reader

Skip to section 8, but do look at Inference in order to see different results specific to Coal Dataset.

Inference and Comparison

There are 86 distinct countries! which has generated electricity from Dirty-fuel source Coal Power.

Till now we have observed the following: -

Work on Natural Gas Dataset

Section 6

Note to the reader

Skip to section 8, but do look at Inference in order to see different results specific to Natural Gas Dataset.

Inference and Comparison

There are 109 distinct countries! which has generated electricity from Dirty-fuel source Gas Power.

Till now we have observed the following: -

Work on Oil Dataset

Section 7

Note to the reader

Skip to section 8, but do look at Inference in order to see different results specific to Oil Dataset.

Inference and Comparison

There are 192 distinct countries! which has generated electricity from Dirty-fuel source Oil Power.

Till now we have observed the following: -

Work on Total Electricity Generated Dataset

Section 8

Note

We shall now be working with a dataset which tells us the total electricity geneated for a paticular country for a paticular year.

The fuel used for electricity generation in this dataset is Fossil Fuels (Oil, Gas, Coal) + Non Fossil Fuels (Nulcear + Hydro + Solar + Wind).

The columns of energy.csv are similar to that of nuclear.csv. For their interpretation please look before Query 1.

Data Cleaning

We have to do data cleaning for energy.csv as well, just like we did for nuclear and hydro.

Here are the following steps which we are going to execute: -

From the list above, we can confirm that no NULL value is left in the table.

Attention!

The result is not same for the above query as we have seen for nuclear and hydro. Why is this the case? Why is USSR missing? We must investigate. But first we must delete the rows for Kosovo and World. The reason for doing this is given below Query 6.

Investigation for missing value of USSR

After looking at Query 6 and Query 16, It has been discovered that the values of USSR were from 1965-1984. After that we get the values of Post Soviet countries like Russia, Ukraine etc.

We must investigate the Minimum and Maximum values of Year column for Nuclear, Hydro, other fuel sources dataset and Total_Electricity Dataset.

Discrepancy

Since the dataset available for Total Electricity is only available from 1985-2021 as opposed to 1965-2021, therefore we must delete the records in Nuclear, Hydro, Wind, Solar dataset for years 1965-1984.

We are doing this because one of our guiding questions is to take out: -

Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated

Since we only have Total Electricity generated from 1985-2021, we shall now only focus on those 36 years.

Now making sure that the year range in both Nuclear and Hydro table is from 1985 - 2021.

Conclusion

Your Datasets are now more or less synchronised.

Also we also have taken care of the USSR Problem and investigated successfully.

Now we would investigate the total electricity usage of each and every country with duration and average consumption per year.

In the above Query, there is a HAVING Condition: -

HAVING Total_Electricity_Generated <> 0.

Let us now look at the countries for whom we have no data for all of their years.

Since one of our guiding questions is to take out: -

Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated

we cannot have value of Total_Electricity_Generated column as 0. Since anything divided by 0 is undefined.

Therefore we must delete these countries from ALL of the datasets. i.e. Nuclear, Hydro, Total_Electricity etc.

Note

For deleting these countries from Nuclear, Hydro and Total_Electricity datasets, one can run the following queries: -

However due to MySQL Timeout, we have to game the system and perform direct deletion.

We have deleted the undesirable countries from all the 3 datasets. Now synchronisation of the 3 datasets need to happen.

Further Dataset Synchronisation

Section 9

Deep Cleansing

1) All the extra values which are in Nuclear dataset but not in Total Electricity Generated dataset need to be eliminated.

2) All the extra values which are in Hydro dataset but not in Total Electricity Generated dataset need to be eliminated.

3) All the extra values which are in Wind dataset but not in Total Electricity Generated dataset need to be eliminated.

4) All the extra values which are in Solar dataset but not in Total Electricity Generated dataset need to be eliminated.

5) All the extra values which are in Coal dataset but not in Total Electricity Generated dataset need to be eliminated.

6) All the extra values which are in Gas dataset but not in Total Electricity Generated dataset need to be eliminated.

7) All the extra values which are in Oil dataset but not in Total Electricity Generated dataset need to be eliminated.

Why? Because we have to take out the precise value of Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated.

and of Dirty Energy % = (Coal + Gas + Oil)/ Total Electricity Generated.

If for a paticular record we have the values of Nuclear and Hydro but NULL for Total Electricity Generated, then we would run into problems because we cannot divide by NULL.

The above query gives an empty database. Therefore there are no extra values in Nuclear that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Hydro that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Wind that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Solar that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Coal that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Gas that are not in Total Electricity Generated.

The above query gives an empty database. Therefore there are no extra values in Oil that are not in Total Electricity Generated.

Let us confirm the above results by seeing the total number of records for each dataset.

As we can see from above results, The total_electricity has more records than both all of the dataset.

Let us now see the records which are in total_electricity but not in other datasets.

The above records are in total_electricity dataset but not in nuclear dataset.

The above records are in total_electricity dataset but not in hydro dataset.

The above records are in total_electricity dataset but not in wind dataset.

The above records are in total_electricity dataset but not in solar dataset.

The above records are in total_electricity dataset but not in coal dataset.

The above records are in total_electricity dataset but not in gas dataset.

The above records are in total_electricity dataset but not in oil dataset.

Integration and Analysis

Section 10

Integrating the Datasets

Finding country wise trend of usage of Eco-friendly fuel V/S Dirty fuel sources: -

Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated.

Dirty Energy % = (Coal + Gas + Oil)/ Total Electricity Generated.